CLAIMS 



WHAT IS CLAIMED IS: 

1. An apparatus for performing computations comprising:
a chaining controller; 
a plurality of computational devices; 

wherein a first chaining subset of the plurality of computational devices includes 
at least two of the plurality of computational devices; and 
wherein the chaining controller is configured to instruct the first chaining subset
to operate as a first computational chain.

2. The apparatus of Claim 1, wherein the plurality of computational devices comprises 
exponentiators, whereby the first computational chain comprises a first exponentiation
chain.

3. The apparatus of Claim 2, further comprising a hardware state controller for each 
exponentiator of the first exponentiation chain, wherein each hardware state controller 
includes replicated fanout control logic. 

4. The apparatus of Claim 3, wherein the replicated fanout control logic is configured to 
allow exponentiators of the first exponentiation chain to chain without delay due to high 
fanout. 

5. The apparatus of Claim 3, wherein the replicated fanout control logic is configured
such that state machines of the first exponentiation chain sequence efficiently. 

6. The apparatus of Claim 2,

wherein each exponentiator further comprises a custom multiplier datapath; and 
wherein each custom multiplier datapath is configured so that the length of its 
longest wire is short. 

7. The apparatus of Claim 6, wherein the custom multiplier datapaths of chained 
exponentiators are physically mirrored to each other so that the wire length between the 
two is short. 

8. The apparatus of Claim 6, wherein the custom multiplier datapath has a serpentine
layout so that the wire length between the most separated adjacent data locations is short.

9. The apparatus of Claim 2, wherein the number of exponentiators in the plurality of
exponentiators equals 2^k, wherein k is a nonnegative integer.

10. The apparatus of Claim 9, wherein k equals 2.

11. The apparatus of Claim 9, wherein each exponentiator is adapted to exponentiate a
512-bit number.

12. The apparatus of Claim 2, wherein the number of exponentiators in the 
exponentiation chain equals 2^k, wherein k is a positive integer.

13. The apparatus of Claim 12, further comprising:
a second exponentiation chain; 

wherein a second chaining subset of the plurality of exponentiators includes at
least two of the plurality of exponentiators;
wherein the chaining controller is configured to instruct the second chaining
subset to operate as a second exponentiation chain; and
wherein no exponentiator of the first exponentiation chain is part of the second
exponentiation chain.

14. The apparatus of Claim 2, wherein each exponentiator further comprises:
a cleave/merge engine;
wherein the cleave/merge engine is configured to:
receive AA, which is a 2w-bit number;
calculate A1 and A2, which are two w-bit numbers based on AA; and
output A1 and A2;
wherein the cleave/merge engine is also configured to:
receive B1 and B2, which are two w-bit numbers;
calculate BB, which is a 2w-bit number based on B1 and B2; and
output BB;
wherein exponentiation of AA yields BB;
wherein exponentiation of A1 yields B1;
wherein exponentiation of A2 yields B2; and
wherein w is a positive integer.

15. The apparatus of Claim 14, wherein A1 and A2 are calculated from AA, and BB is
calculated from B1 and B2, using a scalable Chinese Remainder Theorem
implementation.

16. The apparatus of Claim 15,

wherein each exponentiator is adapted to perform 1024-bit exponentiation;
wherein, if 2048-bit exponentiation is required, the chaining controller causes the
first exponentiation chain to comprise two exponentiators; and
wherein, if 4096-bit exponentiation is required, the chaining controller causes the
first exponentiation chain to comprise four exponentiators.

17. A system for computing comprising: 

a computing device; 

at least one apparatus of Claim 1; and 

wherein the computing device is configured to use the apparatus of Claim 1 to 
perform computations. 

18. A method for performing computations comprising:

loading argument X into session memory; 
loading argument K into session memory; 
cleaving X mod P to compute Xp;
cleaving X mod Q to compute Xq;
exponentiating Xp to compute Cp;
exponentiating Xq to compute Cq;
merging Cp and Cq to compute C; and
retrieving C from the session memory. 
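
The steps of Claim 18 follow the familiar Chinese Remainder Theorem route to modular exponentiation. As a rough sketch only, assuming that P and Q are coprime moduli, that K is the exponent, and that the value sought is X^K mod (P x Q) (the claim does not state these outright), the flow could be modeled in Python as follows. The name `crt_modexp` and the use of Python's built-in `pow` in place of hardware exponentiators are illustrative assumptions, not part of the claimed method; `pow(p, -1, q)` requires Python 3.8 or later.

```python
def crt_modexp(x: int, k: int, p: int, q: int) -> int:
    """Hypothetical software model of the cleave / exponentiate / merge flow.

    Assumes p and q are coprime and 0 <= x < p * q.
    """
    # Cleave: reduce the full-width argument to two half-width residues.
    xp = x % p
    xq = x % q

    # Exponentiate each residue independently (two exponentiators in hardware).
    cp = pow(xp, k, p)
    cq = pow(xq, k, q)

    # Merge: recombine the residues (Garner-style recombination).
    p_inv = pow(p, -1, q)          # inverse of p modulo q, an assumed precomputed value
    h = ((cq - cp) * p_inv) % q
    return cp + h * p
```

With the small illustrative values p = 61 and q = 53, `crt_modexp(1234, 17, 61, 53)` agrees with the direct computation `pow(1234, 17, 61 * 53)`.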

19. The method of Claim 18, further comprising: 

selecting one session controller of 32 available session controllers; 

setting the busy bit for the one session controller; 

wherein the argument X is a 1024-bit number; 

wherein C is a 1024-bit number; and 

clearing the busy bit for the one session controller. 

20. The method of Claim 18, further comprising:

selecting two session controllers of 32 available session controllers;
setting the busy bits for the two session controllers;
wherein loading argument X into session memory includes: 

loading part of the argument X into the session memory of one of the two 
session controllers; 

loading the remainder of the argument X into the session memory of the 
other of the two session controllers; 

wherein the argument X is a 2048-bit number; 
wherein C is a 2048-bit number; and 
clearing the busy bits for the two session controllers. 

21. The method of Claim 18, further comprising: 

selecting four session controllers of 32 available session controllers; 
setting the busy bits for the four session controllers;
wherein loading argument X into session memory includes:

loading a first part of the argument X into the session memory of a first of
the four session controllers;
loading a second part of the argument X into the session memory of a
second of the four session controllers;
loading a third part of the argument X into the session memory of a third
of the four session controllers;
loading the remainder of the argument X into the session memory of a
fourth of the four session controllers;
wherein the argument X is a 4096-bit number; 
wherein C is a 4096-bit number; and 
clearing the busy bits for the four session controllers. 
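
Claims 19 to 21 share one pattern: select one, two, or four of the 32 session controllers, set their busy bits, spread the argument X across their session memories, and clear the busy bits afterwards. The following Python sketch models that bookkeeping under stated assumptions: the class `SessionPool`, the helpers `split_argument` and `join_parts`, the choice of the lowest-numbered free controllers, and the least-significant-part-first ordering are all hypothetical, since the claims do not specify a selection policy or a part ordering.

```python
class SessionPool:
    """Hypothetical model of 32 session controllers, each with a busy bit."""

    def __init__(self, count: int = 32) -> None:
        self.busy = [False] * count

    def acquire(self, n: int) -> list[int]:
        # Select n free controllers (lowest index first, an assumed policy)
        # and set their busy bits.
        free = [i for i, b in enumerate(self.busy) if not b]
        if len(free) < n:
            raise RuntimeError("not enough free session controllers")
        chosen = free[:n]
        for i in chosen:
            self.busy[i] = True
        return chosen

    def release(self, ids: list[int]) -> None:
        # Clear the busy bits once the result has been retrieved.
        for i in ids:
            self.busy[i] = False


def split_argument(x: int, parts: int, part_bits: int = 1024) -> list[int]:
    # Split x into `parts` chunks of part_bits each, least significant first.
    mask = (1 << part_bits) - 1
    return [(x >> (i * part_bits)) & mask for i in range(parts)]


def join_parts(chunks: list[int], part_bits: int = 1024) -> int:
    # Inverse of split_argument.
    return sum(c << (i * part_bits) for i, c in enumerate(chunks))
```

Under these assumptions, a 2048-bit argument would use `acquire(2)` and `split_argument(x, 2)`, and a 4096-bit argument would use `acquire(4)` and `split_argument(x, 4)`.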

22. The method of Claim 18,

wherein the cleaving X mod P comprises:
setting A[513:0] = X[1023:510];
calculating Z[1026:0] = A[513:0] x uP[512:0], wherein uP[512] = 1;
setting B[513:0] = Z[1026:512];
setting C[513:0] = X[513:0];
calculating Y[1025:0] = B[513:0] x P[511:0];
setting D[513:0] = Y[513:0];
calculating E[513:0] = C[513:0] - D[513:0];
if E > P then calculating E = E - P;
if E > P then E = E - P; and
setting Xp = E[511:0] as the result of the cleaving X mod P, whereby Xp
equals X mod P; and
wherein the cleaving X mod Q comprises:
setting A[513:0] = X[1023:510];
calculating Z[1026:0] = A[513:0] x uQ[512:0], wherein uQ[512] = 1;
setting B[513:0] = Z[1026:512];
setting C[513:0] = X[513:0];
calculating Y[1025:0] = B[513:0] x Q[511:0];
setting D[513:0] = Y[513:0];
calculating E[513:0] = C[513:0] - D[513:0];
if E > Q then calculating E = E - Q;
if E > Q then E = E - Q; and
setting Xq = E[511:0] as the result of the cleaving X mod Q, whereby Xq
equals X mod Q.
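
The cleave steps of Claim 22 have the shape of a Barrett reduction: estimate the quotient with a precomputed reciprocal (uP or uQ), subtract the estimated multiple of the modulus using truncated arithmetic, then correct with at most two conditional subtractions. The Python sketch below illustrates that technique in its textbook form, parameterized by the modulus width n. It is offered under assumptions: the function names are hypothetical, the reciprocal is taken to be floor(2^(2n) / P), the quotient estimate is shifted by n + 2 bits as in the textbook formulation (which may not match the bit ranges recited in the claim), and the corrections test `>=` rather than `>` so that a residue equal to the modulus reduces to zero.

```python
def barrett_reciprocal(p: int, n: int) -> int:
    # Assumed definition of the precomputed constant: floor(2^(2n) / p).
    return (1 << (2 * n)) // p


def barrett_cleave(x: int, p: int, mu: int, n: int) -> int:
    """Textbook Barrett reduction sketch: returns x mod p.

    Assumes 2^(n-1) <= p < 2^n and 0 <= x < 2^(2n).
    """
    mask = (1 << (n + 2)) - 1
    a = x >> (n - 2)            # high bits of x (n = 512 gives X[1023:510])
    z = a * mu                  # multiply by the reciprocal
    b = z >> (n + 2)            # quotient estimate, low by at most 2
    c = x & mask                # low n + 2 bits of x
    d = (b * p) & mask          # low n + 2 bits of the estimated multiple
    e = (c - d) & mask          # truncated difference, always below 3p
    if e >= p:                  # first correction
        e -= p
    if e >= p:                  # second correction
        e -= p
    return e
```

For the 1024-bit case of the claim, n would be 512, so that `barrett_cleave(x, p, barrett_reciprocal(p, 512), 512)` returns `x % p` without a hardware divider.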

23. The method of Claim 18, wherein merging Cp and Cq to compute C comprises:
if Cp > P then calculating Cp = Cp - P;
if Cq > Q then calculating Cq = Cq - Q;
calculating A[512:0] = Cq[511:0] - Cp[511:0];
if A < 0 then calculating A[511:0] = A[511:0] + Q[511:0];
calculating B[1023:0] = A[511:0] x P^-1[511:0];
calculating D[511:0] = Cleave B[1023:0] mod Q[511:0], wherein uQ[512] = 1;
calculating E[1023:0] = D[511:0] x P[511:0];
calculating C[1023:0] = E[1023:0] + Cp[511:0]; and
wherein C[1023:0] is the result of merging Cp and Cq.
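
The merge steps of Claim 23 can be traced one by one in a short Python sketch. It keeps the intermediate names A, B, D, and E from the claim and makes several assumptions that the claim does not spell out: the inverse term is taken to be the inverse of P modulo Q, the cleave of B is stood in for by Python's `%` operator, the initial corrections test `>=`, and the sign correction loops rather than adding Q only once so that the sketch also holds when Q is much smaller than P. The function name `merge` is illustrative.

```python
def merge(cp: int, cq: int, p: int, q: int, p_inv: int) -> int:
    """Hypothetical model of the merge step.

    Assumes p and q are coprime and p_inv * p == 1 (mod q).
    """
    # Bring each residue into range.
    if cp >= p:
        cp -= p
    if cq >= q:
        cq -= q

    a = cq - cp
    while a < 0:            # the claim adds Q once; a loop is an assumption here
        a += q

    b = a * p_inv           # double-width product
    d = b % q               # stands in for "Cleave B mod Q"
    e = d * p
    return e + cp           # C, with C mod p == cp and C mod q == cq
```

With the illustrative values p = 61, q = 53, and p_inv = 20, merging the residues of 1000 (24 and 46) gives A = 22, B = 440, D = 16, E = 976, and C = 1000.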